A Survey on Statistical Approaches to Natural Language Processing
Abstract
This survey reviews the recent surge of interest in statistical approaches to natural language processing based on large corpora. It begins with a historical overview that goes back to the 1950s, when Noam Chomsky proposed his phrase-structure transformational grammar and rejected Markov-process models of natural language. With the development of large corpora and language modeling in recent years, the statistical approach to natural language processing has been revived and is gaining attention among computational linguists. The survey first addresses the most successful statistical approach, part-of-speech tagging using the hidden Markov model (HMM) and dynamic programming. It then briefly introduces the self-organized method for estimating the parameters of a language model. This is followed by various statistical estimation methods and the related probability theory. Finally, the corpus-based approach to syntactic structure and statistical machine translation are presented. To conclude, we present a brief discussion of future trends in statistical approaches to natural language processing.
1. Historical overview
With the recent increase in interest in statistical approaches to natural language processing, corpus linguistics has become a hot topic. This is because texts are more available than ever before and easier to use for various data tasks. The success of part-of-speech tagging with the hidden Markov model (HMM) has also drawn the attention of computational linguists to lexical analysis, language modeling, and machine translation by various statistical methods. The 1950s style of empiricism is back in fashion.
Similar resources
A New Text Mining Method for Extracting User Context Information to Improve Search Engine Result Ranking
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...
Linguistic Structure Prediction
A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure. We seek to unify across many approaches and many kinds of linguistic structures. Assuming a basic understanding of natural language processing and/or machine learning, we seek to bridge the gap between ...
Survey: Finite-state technology in natural language processing
In this survey, we will discuss current uses of finite-state information in several statistical natural language processing tasks. To this end, we will review standard approaches in tokenization, part-of-speech tagging, and parsing, and illustrate the utility of finite-state information and technology in these areas. The particular problems were chosen to allow a natural progression from simple...
An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches
Text coherence evaluation has become a vital and appealing task in Natural Language Processing subfields such as text summarization, question answering, text generation, and machine translation. Existing methods such as entity-based and graph-based models deal with how nouns and noun phrases change role across sequential sentences within a short part of a text. They also have limitations in global coheren...
Using Generalized Language Model for Question Matching
Question answering services are among the popular services on the World Wide Web. The main goal of these services is to find the best answer to a user's input question as quickly as possible. To achieve this aim, most of them use new techniques for question matching. There are many question answering services on the Persian web, so it seems that developing a question matching m...